ABSTRACT

This report updates and summarizes what is known from research on the effects of test accommodations for students with disabilities. It also provides direction for the design of critically needed future research on accommodations. A review was conducted of 46 empirical research studies on accommodations published from 1999 through 2001. The majority of the studies involved criterion-referenced tests used for state accountability. Findings from at least four studies each indicate that three accommodations had a positive effect on student test scores: computer administration, oral presentation, and extended time. However, additional studies of each of these accommodations found either no significant effect on scores or alterations in item comparability. All of the meta-analyses of accommodated conditions found a positive effect on scores, and all of the studies examining differential item functioning (DIF) under accommodated conditions found some items that exhibited DIF. The analysis also indicates a need for a clear definition of the constructs tested, greater clarity about the accommodations needed by individual students, and exploration of how desirable and useful students themselves find accommodations. Several appendices summarize the types of accommodations and research results. (Contains 54 references.)
Executive Summary



The enactment of the No Child Left Behind Act of 2001 brings an urgency to knowing whether 
the use of certain accommodations threatens test validity or score comparability, and whether 
specific accommodations are useful for individual students. This report is intended to update 
and summarize what we know from research on the effects of accommodations, and also to 
provide direction to the design of critically needed future research on accommodations. We 
found 46 empirical research studies on accommodations published from 1999 through 2001. 
These studies had the following characteristics: 

Purpose. The primary purpose of the 1999-2001 accommodations research was to 
determine the effects of accommodations use on the large-scale test scores of students 
with disabilities. 

Types of assessment, content areas, and accommodations. The majority of the studies involved criterion-referenced tests used for state accountability. Mathematics was assessed in half of the studies, and reading/language arts was assessed in about one third. Presentation accommodations were investigated most frequently, with “oral presentation” selected for analysis in nearly half of the studies.

Participants. The number of participants ranged from 3 to nearly 21,000. The largest number of studies included elementary school students, with the greatest number examining accommodation use by fourth graders. Twenty-seven studies documented the participants’ types of disabilities; in those studies, learning and cognitive disabilities were most frequently investigated.

Research design. Each study was classified as using one of four group research designs, a single subject research design, or a non-experimental or other design. Over one third of the studies applied non-experimental or other designs to the study of accommodation effects.

Findings. Despite the variability in the characteristics of the accommodations research conducted from 1999 through 2001, the findings point to directions for further research. Three accommodations showed a positive effect on student test scores in at least four studies each: computer administration, oral presentation, and extended time. However, additional studies of each of these accommodations found either no significant effect on scores or alterations in item comparability. All of the meta-analyses of accommodated conditions found a positive effect on scores, and all of the studies examining differential item functioning (DIF) under accommodated conditions found some items that exhibited DIF.




Limitations. The researchers who conducted the accommodations studies often identified limitations in their own work, and these limitations need more attention in future research. The most frequently cited limitations were unknown variations among the students included in the study, sample sizes too small to provide adequate statistical power, and nonstandard administration of the accommodations across proctors and schools. These limitations and other considerations led researchers to recommend replicating the research for validation and generalization, and investigating associations with specific disabilities. Researchers also recommended three further steps: conducting more detailed non-experimental studies to provide richer data, increasing researcher control of the testing process, and studying larger groups of students.

Important overall observations from our analysis include a need for a clear definition of the constructs tested, greater clarity about the accommodations needed by individual students, and exploration of how desirable and useful accommodations are to students themselves, the “end users” of assessments. Future research should also explore the effects of assessment design and standardization. It should examine whether new item designs and more flexible testing conditions (i.e., universal design) reduce the need for accommodations while still measuring the critical constructs for students with disabilities.
Overview



One of the most viable ways to increase the participation of students with disabilities in assessments is through the use of accommodations. Thurlow and Bolt (2001) define accommodations as “changes in assessment materials or procedures that address aspects of students’ disabilities that may interfere with the demonstration of their knowledge and skills on standardized tests. Accommodations attempt to eliminate barriers to meaningful testing, thereby allowing for the participation of students with disabilities in state and district assessments” (p. 1). Tindal and Fuchs (1999) further define accommodations as “changes in standardized assessment conditions introduced to level the playing field for students by removing the construct-irrelevant variance created by their disabilities. Valid accommodations produce scores for students with disabilities that measure the same attributes as standard assessments measured in nondisabled individuals” (p. 8).

All states now have accommodation policies (Thurlow, Lazarus, Thompson, & Robey, 2002). Nearly sixty percent of states keep track of the use of accommodations during state assessments, and about half of these report an increase in use by students (Thompson & Thurlow, 1999). However, there is still only limited consensus on what constitutes an “appropriate” accommodation. States continue to grapple with decisions about how to score and report the use of accommodations that some consider “nonstandard” or “nonscorable.”

There is a critical and ongoing need for more research on how the use of accommodations affects the psychometric characteristics of assessment results. The enactment of the No Child Left Behind Act of 2001 has brought an urgency to knowing whether the use of certain accommodations threatens test validity or score comparability. Similarly, there is a need to know whether specific accommodations are useful for individual students (Thurlow, McGrew, Tindal, Thompson, Ysseldyke, & Elliott, 2000). The amount of accommodations research has increased dramatically in recent years. Tindal and Fuchs (1999) found an increase from 11 studies published from 1990 through 1992 to 29 studies published from 1996 through 1998. The current study, which focuses on studies published from 1999 through 2001, includes 46 empirical research studies on accommodations (see Table 1).



Table 1. Number of Accommodations Research Studies Published from 1990 Through 2001

Years                   Number of Studies
1990 through 1992       11
1993 through 1995       18
1996 through 1998       29
1999 through 2001       46

This increase is partly due to support from the U.S. Department of Education for research on a variety of issues related to the participation of students with disabilities in large-scale assessments. Federal support has come from both the Office of Special Education Programs (OSEP) and the Office of Educational Research and Improvement (OERI). Some states and test publishers have also recently supported additional research efforts.

The purpose of this paper is to summarize several components of the research on the effects of test accommodations published from 1999 through 2001, including: type of assessment, content area assessed, number of research participants, types of disabilities included in the sample, grade level of the participants, research design, research findings, limitations of the study, and recommendations for future research.



Method

Four major databases were searched to identify research on test accommodations published from 1999 through 2001: ERIC, PsycINFO, Educational Abstracts, and Digital Dissertations.
Research papers were also obtained at major conferences. Additional resources for identifying 
research included: 

• Behavioral Research and Teaching at the University of Oregon: http://brt.uoregon.edu/ 

• Education Policy Analysis Archives: http://epaa.asu.edu 

• National Center for Research on Evaluation, Standards, and Student Testing: http://www.cse.ucla.edu/



• Wisconsin Center for Educational Research: http://www.wcer.wisc.edu/testacc/ 

Several search terms were used. The terms were varied systematically to ensure the identification of all research on changes in testing published from 1999 through 2001. Search terms included:

• accommodation 

• test adaptation 

• test changes 

• test modifications 
• test accommodations



• state testing accommodations 

• standards-based testing accommodations 

• large-scale testing accommodations 

A decision was made to limit the selection of publications to empirical research. Included within 
this realm are studies with samples consisting of preschool, kindergarten through high school, 
and postsecondary students. The focus of the empirical research was not limited to large-scale testing; it also included studies that incorporated intelligence tests and curriculum-based
measures (CBM). We decided to focus on testing accommodations as opposed to instructional 
accommodations, although there is some overlap between these purposes in the literature. We 
did not include any conceptual or opinion pieces in this analysis. 



As a result of the extensive search effort described above, 46 research studies, published between 
1999 and 2001, were selected for this analysis. All of the studies are empirical, that is, they 
include an analysis of data. Nineteen of the studies were published in journals, 12 in reports, 
12 in papers presented at conferences, and 3 were dissertations. The researchers and references 
to each publication are listed in Appendix A. 

Purpose of the Research 

The primary purpose of the accommodations research conducted from 1999 through 2001 was to determine the effect of accommodation use on the large-scale test scores of students with disabilities (see Table 2). Over half of the studies investigated whether accommodations gave the test scores of students with disabilities a differential boost. That is, they asked whether the accommodation had a greater effect on the scores of students with disabilities than on the scores of students without disabilities. The second most common purpose was to investigate the effects of accommodations on test score validity. Seven studies analyzed institutional factors, teacher judgment, or student desirability of accommodation use; three of these also examined the effect of accommodation use on test scores. Finally, five studies examined patterns of errors across items or tests. Although we have categorized the purposes into only four primary groups, the results of this analysis show wide variation in research designs, participants, and assessments. Appendix A summarizes the purpose of each study.
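The differential boost can be computed as an interaction contrast. It is the gain from the accommodation for students with disabilities minus the gain for students without disabilities. The sketch below shows this under stated assumptions and is not the method of any reviewed study. The function name `differential_boost` and all scores are hypothetical. An actual study would also test the interaction statistically (for example, with a two-way analysis of variance) rather than only compare raw means.

```python
from statistics import mean

def differential_boost(swd_acc, swd_std, non_acc, non_std):
    """Hypothetical helper: return (gain for students with disabilities,
    gain for students without disabilities, differential boost).

    Each argument is a list of scores for one cell of the
    disability-by-accommodation design.
    """
    swd_gain = mean(swd_acc) - mean(swd_std)   # accommodation effect for SWD
    non_gain = mean(non_acc) - mean(non_std)   # accommodation effect for peers
    return swd_gain, non_gain, swd_gain - non_gain  # interaction contrast

# Invented scores for illustration only (not from any reviewed study).
swd_gain, non_gain, boost = differential_boost(
    swd_acc=[22, 25, 27, 24], swd_std=[18, 20, 21, 19],
    non_acc=[31, 33, 30, 34], non_std=[30, 32, 30, 32],
)
print(f"SWD gain={swd_gain}, non-SWD gain={non_gain}, boost={boost}")
```

A positive third value means the accommodation raised the scores of students with disabilities more than those of their peers. Similar gains for both groups would suggest that the accommodation raises scores generally rather than removing a disability-related barrier.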



Results 
Table 2. Purposes of Reviewed Research

Research Purpose                                    Number of Studies
Determine the effect of the use of accommodations   24
  on test scores of students with disabilities
Investigate the effects of accommodations on test   10
  score validity
Study institutional factors, teacher judgment, or   7
  student desirability of accommodation use
Examine patterns of errors across items or tests    5



Type of Assessment 

Three primary types of assessments were used across the 46 studies selected for this analysis: norm-referenced and other standardized tests, state criterion-referenced tests or performance assessments, and school or district-designed tests (see Table 3). Three of the studies used assessments from more than one category. Seventeen studies used a total of 10 different norm-referenced and other standardized tests: the National Assessment of Educational Progress, Test of General Educational Development, Stanford Achievement Test, Iowa Test of Basic Skills, Peabody Picture Vocabulary Test, California Achievement Test, TerraNova, Scholastic Ability Test for Adults, Psychoeducational Profile, and Miller Analogies Test.

Studies using criterion-referenced tests relied primarily on large-scale state accountability measures. These assessments were used in 21 of the studies selected for this analysis and came from a total of 13 states: Indiana, Kansas, Kentucky, Massachusetts, Maryland, Minnesota, Missouri, New York, Oregon, Rhode Island, South Carolina, Washington, and Wisconsin. Four studies were conducted in each of two states: Oregon and Wisconsin.

Six studies used school or district-designed tests. These included performance assessments, curriculum-based measures, and math computation tests. The “other” category consisted of three surveys, a checklist, and two studies that analyzed multiple investigations (e.g., meta-analysis). Appendix B lists the type of assessments used in each study.



Table 3. Types of Assessment in Reviewed Research

Type of Assessment                              Number of Studies*
Norm-referenced and Other Standardized Tests    17
State Criterion-referenced Tests or Performance 21
  Assessments
School or District-designed Tests               6
Other                                           6

* Some studies had assessments that fit into more than one category.






NCEO 



Content Area Assessed 



Researchers used assessments across five basic academic content areas: reading/language arts, writing, mathematics, science, and social studies. Mathematics was assessed in half of the studies, and reading/language arts was assessed in 16 studies (see Table 4). The studies categorized as “no specific content area” included surveys, meta-analyses of several studies, and general academic assessments in which no specific content was identified. Most of the research focused on a single content area, while some of the larger studies addressed two or more content areas (see Appendix C).



Table 4. Content Areas Assessed in Reviewed Research

Content Areas Assessed      Number of Studies*
Mathematics                 23
Reading/Language Arts       16
Science                     9
Writing                     7
Social Studies              3
No Specific Content Area    9

* Some studies assessed more than one content area.



Type of Accommodation 

Eleven types of accommodations were investigated in at least two of the 46 research studies selected for this analysis (see Table 5). These accommodations were categorized into four groups: presentation, response, setting, and timing/scheduling (see Appendix D). In addition, 14 of the studies investigated the effects of multiple accommodations.

Presentation accommodations were investigated most frequently, with “oral presentation” examined in 22 studies. Other presentation accommodations included computer administration, simplified language, and large print. Timing and scheduling accommodations were the next most frequently investigated, with extended time analyzed in 17 studies. Other timing and scheduling accommodations studied included testing over multiple days and the use of frequent breaks. Response accommodations came next and included dictated response, use of a word processor, and calculator use. Finally, use of a separate or individual setting was investigated in five studies.

The 19 studies listed in the “other” category include accommodations investigated in only one study. These included extra spacing, assistive devices, repeated directions, verbal encouragement, cueing, interpreter, and support in understanding directions. In addition, survey research not aimed at investigating the effects of a particular accommodation was recorded as “other.”
Table 5. Types of Accommodation in Reviewed Research

Type of Accommodation               Number of Studies*
Presentation:
  Oral Administration               22
  Computer Administration           8
  Simplified Language               6
  Large Print                       2
Response:
  Dictated Response                 6
  Word Processor                    4
  Calculator                        2
Setting:
  Separate setting/small group      5
Timing/Scheduling:
  Extended Time                     17
  Multiple Day                      2
  Frequent Breaks                   2
Multiple Accommodations             14
Other                               19

* Some studies assessed more than one accommodation.



Research Participants 

Appendix E describes the research participants, including the number in each study, the percentage of participants with disabilities, grade level or age, and types of disabilities. The number of participants ranged from 3 to nearly 21,000. Table 6 shows that about half of the studies included fewer than 200 participants. The largest study included 20,791 participants in Kentucky’s statewide assessment over a two-year period (Koretz & Hamilton, 2000).



Table 6. Number of Participants in Reviewed Research

Number of Participants    Number of Studies
1-99                      11
100-199                   12
200-299                   6
300-499                   4
500-999                   2
More than 1000            7
Number Unknown            4



Thirty-two studies documented the number of participants with disabilities (see Table 7). The percentage of students with disabilities ranged from 6 percent to 100 percent of the total sample. Seven studies included fewer than 25 percent students with disabilities, 12 included 25 to 49 percent, 8 included 50 to 74 percent, and 5 included 75 to 100 percent. Eight studies included students with disabilities but did not document their number or percentage. Six studies were reviews or other approaches that did not directly include participants.



Table 7. Percent of Sample Consisting of Students with Disabilities in Reviewed Research

Percent of Sample       Number of Studies
Consisting of Students
with Disabilities
1-24%                   7
25-49%                  12
50-74%                  8
75-100%                 5
Percent Unknown         8
Not Applicable          6



Participants in the research studies ranged in age from elementary school through postsecondary 
education (see Table 8). The largest number of studies (16) included elementary school students, 
with the greatest number of studies examining accommodation use by fourth graders. Eleven 
studies examined accommodation use by middle school students and six studies included high 
school students. Six studies looked at accommodation use across all grade levels, and three 
studied postsecondary students. Grade level information did not apply to or was not reported 
in four studies. 



Table 8. Summary of Participant Grade Levels in Reviewed Research

Participant Grade Level         Number of Studies
Elementary (grades K-5)         16
Middle School (grades 6-8)      11
High School (grades 9-12)       6
Multiple Grade Levels (K-12)    6
Postsecondary                   3
Not Applicable                  4



Twenty-seven studies documented the types of disabilities experienced by participants (see Table 9). Some of the studies included students with a variety of disabilities, while others focused on a single disability (e.g., learning disability) or deficit area (e.g., reading). Three studies included students from all disability categories. Apart from these three studies, only one included students with hearing impairments, and none included students with visual or physical disabilities (see Appendix E).
Table 9. Types of Disabilities Experienced by Participants in Reviewed Research

Type of Disability                                  Number of Studies
Learning Disability                                 18
Cognitive Disability (e.g., mental retardation)     10
Emotional/behavioral Disability                     9
Communication Disability                            7
Reading or Math Deficit                             5
Other (includes physical and sensory                9
  disabilities, autism, attention deficit disorder,
  health impairments, and multiple disabilities)



Research Design 

The research designs used in the studies selected for this analysis (see Appendix F) were organized according to types identified by Thurlow et al. (2000). These included four types of group research designs, a single subject research design, and a non-experimental or other design. The four group research designs are shown in Figure 1.



Figure 1. Group Research Designs

Design 1: Score comparability, interaction between presence of disability and accommodation use, equivalent test forms

                         Disability      Disability      Non-Disability  Non-Disability
                         Group 1         Group 2         Group 1         Group 2
With Accommodation       Test Form A     Test Form B     Test Form A     Test Form B
Without Accommodation    Test Form B     Test Form A     Test Form B     Test Form A

Design 2: Score comparability, interaction between presence of disability and accommodation use, matched sample

                         Disability      Disability      Non-Disability  Non-Disability
                         Group 1         Group 2         Group 1         Group 2
With Accommodation       Test Form A     -               Test Form A     -
Without Accommodation    -               Test Form A     -               Test Form A

Design 3: Score comparability with accommodation use, equivalent test forms

                         Disability      Disability
                         Group 1         Group 2
With Accommodation       Test Form A     Test Form B
Without Accommodation    Test Form B     Test Form A

Design 4: Score comparability with accommodation use, matched sample

                         Disability      Disability
                         Group 1         Group 2
With Accommodation       Test Form A     -
Without Accommodation    -               Test Form A



The four group research designs shown in Figure 1 differ in the controls they include and in their requirements for matching the samples. For example, in Design 1 participants take two equivalent forms (A and B) of the same test, one with and the other without accommodations. Participants with and without disabilities who take the test without the accommodations are drawn from the general testing population. Their scores are randomly selected from the total test sample of all students who regularly take Forms A and B. This design does not require the samples from the disability and non-disability groups to be closely similar (i.e., matched) in important characteristics.
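The counterbalancing in Design 1 can be written out as a small assignment table. The sketch below is an illustrative reconstruction of Figure 1. It assumes only the form-by-condition pairing shown there and says nothing about the order in which the two forms are administered. The function `design1_plan` is hypothetical. Because Groups 1 and 2 swap forms, each form is used equally often with and without the accommodation. This keeps a difference in form difficulty from being mistaken for an accommodation effect.

```python
from itertools import product

def design1_plan():
    """Hypothetical helper: return the Design 1 form assignment from Figure 1
    as {group name: {"with": form, "without": form}}.

    Group 1 in each disability status takes Form A with the accommodation and
    Form B without it; Group 2 takes the reverse.
    """
    plan = {}
    for status, number in product(["Disability", "Non-Disability"], [1, 2]):
        with_form, without_form = ("A", "B") if number == 1 else ("B", "A")
        plan[f"{status} Group {number}"] = {"with": with_form, "without": without_form}
    return plan

plan = design1_plan()
for group, forms in plan.items():
    print(f"{group}: Form {forms['with']} with accommodation, "
          f"Form {forms['without']} without")

# Balance check: each form appears equally often under each condition.
for condition in ("with", "without"):
    print(condition, sorted(f[condition] for f in plan.values()))
```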

In Design 2, only one form of the test is used, but students in the disability groups and the non-disability groups must be matched samples that are equivalent in important characteristics (e.g., age, disability category, accommodation need). If the students are not matched on important characteristics, it is impossible to determine whether differences between the groups’ score characteristics are due to the effects of the accommodations or to differences in sample characteristics. If appropriate matching can take place, participants with and without disabilities who take the test without accommodations can be drawn from the general testing population.

In Design 3, score comparability as a function of accommodation use is examined only for students with disabilities. Based on prior research, this design assumes that the scores of participants with disabilities who take the test without the accommodation are comparable to the scores of participants without disabilities who take the test without accommodations. Participants must take two equivalent forms of the same test, one with and one without accommodations. Participants who take the test without accommodations can be drawn from the general testing population taking both Form A and Form B.
Design 4 also examines score comparability only for participants with disabilities. Like Design 3, it assumes, based on prior research, that the scores of students with disabilities who take the test without the accommodation are comparable to the scores of students without disabilities who take the test without the accommodation. This design requires matched samples but does not require equivalent forms, so a single test form can be used. Without matched samples, it is impossible to determine whether differences in score characteristics are due to the effect of the accommodation or to differences in sample characteristics.

Table 10 shows the number of studies that used each of the four group designs, as well as the single subject and non-experimental or other designs. As the table shows, 12 of the accommodations studies published from 1999 through 2001 used Design 1. This is far more than any other design except the non-experimental/other category. The three studies that used single subject designs were generally intended to determine whether an accommodation is effective for individual students, and in some cases to identify the reasons for its effects. These studies monitored student performance over time while systematically introducing various “treatments” considered to be accommodations. The 17 studies in the last category used a variety of methods, such as meta-analyses, survey research, investigations of differential item functioning, post-hoc comparisons of scores, and methods for testing the fit of various models. Appendix F provides a complete summary of the research designs used across all studies.



Table 10. Research Designs in Reviewed Research

Type of Research Design                             Number of Studies*
Group Research Design 1: Score comparability,       12
  interaction between presence of disability and
  accommodation use, equivalent test forms
Group Research Design 2: Score comparability,       2
  interaction between presence of disability and
  accommodation use, matched sample
Group Research Design 3: Score comparability with   7
  accommodation use, equivalent test forms
Group Research Design 4: Score comparability with   6
  accommodation use, matched sample
Single Subject Research Design                      3
Non-experimental/Other                              17

* All studies except one fit within only one of the categories. The study that fit multiple categories was coded as both Design 1 and Design 3 because not enough information was available to distinguish between the two.



Research Results 

Table 11 summarizes the results for the primary accommodations examined in the 46 research studies we reviewed, which used the designs described previously. Summaries of the research results of each study can be found in Appendix G.



Table 11. Research Results from Reviewed Research

Type of Accommodation                       Research Results                                        Number of Studies
Computer Administration (N = 9)             Positive effect on scores                               4
                                            No significant effect on scores                         3
                                            Altered item comparability                              2
Oral Presentation (N = 10)                  Positive effect on scores                               6
                                            No significant effect on scores                         1
                                            Altered item comparability                              2
                                            Did not alter item comparability                        1
Extended Time (N = 7)                       Positive effect on scores                               4
                                            No significant effect on scores                         3
Student-Paced Video (N = 1)                 Positive effect on scores                               1
Examiner Familiarity (N = 1)                Positive effect on scores                               1
Type of Calculator (N = 1)                  No significant effect on scores                         1
Simplified Language (N = 1)                 No significant effect on scores                         1
Sign Language (N = 1)                       Altered item comparability                              1
Meta-Analyses of Accommodated               Positive effect on scores                               5
Conditions (N = 5)
Differential Item Functioning (DIF) Under   Some items exhibited DIF under accommodated conditions  6
Accommodated Conditions (N = 6)
Educator Beliefs (N = 4)                    Accommodation decisions are based on educator beliefs   4



Read Aloud. The greatest number of studies analyzed oral administration, often referred to 
as a “read aloud” accommodation. This accommodation was generally found to have positive 
effects on test scores of students with disabilities. For example, Calhoon et al. (2000) found 
that students performed better when a teacher read an assessment out loud than when standard 
paper/pencil administration was used. Helwig et al. (1999) found that students with low math 
proficiency, regardless of reading ability, scored better with oral presentation of a math test. 
Only one study found no significant effect on test scores. Two additional studies found that oral administration altered item comparability, which affects the construct the assessment was intended to measure. One other study found no alteration in item comparability.

Computer Administration. The nine studies conducted on the use of computer administration 
as an accommodation showed varied results. Four of these studies found a positive effect on 
scores. For example, Brown and Augustine (2001) found that students with reading disabilities
performed better using screen reading software than on paper/pencil tests. Burk (1999) also 
found that students performed better when they used a computer. Three studies resulted in no 
significant effects on scores. Two studies found that the use of computer administered tests 
altered item comparability, affecting the construct the assessment was intended to measure. 
Extended Time and Multiple Days. The use of extended time and the administration of tests
over multiple days had a positive effect on the scores of students with disabilities in four studies. 
For example, Fuchs et al. (2000a) found that performance on problem-solving curriculum-based 
measures improved for students with learning disabilities using the extended time accommoda- 
tion. Similarly, Huesman and Frisbie (2000) found that students with learning disabilities made 
significantly greater gains on a norm-referenced test with extended time than students without 
learning disabilities. However, four additional studies did not find an effect of extended time or 
multiple-day testing on the scores of students with disabilities. 

Calculator Use, Encoding, and Examiner Familiarity. Positive effects were found in studies of calculator use, encoding, and examiner familiarity. For example, Fuchs et al. (2000a) found that performance on problem-solving curriculum-based measures improved for students with learning disabilities when they used a calculator and when they used encoding (having responses written for them).
Szarko (2000) found that examiner familiarity had a significant positive effect on the behavior 
and testing performance of children with autism. 

Meta Analyses and Other Studies. All of the meta analyses and studies of multiple accommodations
found positive effects of accommodations on the test scores of students with disabilities.
For example, a meta analysis by Chiu and Pearson (1999) found that the use of accommodations
improved test scores of students with disabilities. Elliott, S. et al. (2001) found that the use of
test accommodation packages resulted in moderate to large positive effects on the test scores
of students with disabilities. Similarly, Schulte et al. (2001) found that testing accommodations
had a significant and differential impact on the math scores of students with disabilities who
received accommodation packages other than extra time and oral presentation.

Six studies found Differential Item Functioning (DIF), which occurs when students who are
equated on the relevant ability (as defined by test performance) but belong to different groups
have statistically different probabilities of responding correctly to a test item. DIF is
investigated by comparing item difficulty across groups after students have been matched on
ability. For example, Lewis et al. (1999) found that for both reading and math assessments, only
a few items exhibited DIF for participants under the reading accommodation conditions; more
English language arts items than math items exhibited DIF.
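One widely used way to carry out the matched comparison described above is the Mantel-Haenszel procedure, which compares the odds of answering an item correctly for two groups within each level of the matching score. The sketch below is illustrative only and is not necessarily the method used by Lewis et al. (1999) or the other studies reviewed. It assumes that examinees are matched on total test score, that the "reference" group tested under standard conditions and the "focal" group under an accommodated condition, and that the function name and data layout are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Illustrative Mantel-Haenszel DIF statistics for one test item.

    records: iterable of (group, matching_score, correct) tuples, where
      group is "reference" (e.g., standard administration) or "focal"
      (e.g., accommodated administration; a hypothetical labeling),
      matching_score is the total score used to match examinees, and
      correct is True or False for this item.
    Returns (alpha_mh, mh_d_dif).
    """
    # One 2x2 table per score stratum, stored as [A, B, C, D]:
    # A = reference correct, B = reference incorrect,
    # C = focal correct,     D = focal incorrect
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        cell = strata[score]
        if group == "reference":
            cell[0 if correct else 1] += 1
        elif group == "focal":
            cell[2 if correct else 3] += 1
        else:
            raise ValueError(f"unknown group: {group!r}")

    # Pool the within-stratum odds ratios (strata containing only one
    # group add zero to both sums, so they do not affect the result).
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these data")

    alpha_mh = num / den                    # 1.0 means no DIF
    mh_d_dif = -2.35 * math.log(alpha_mh)   # ETS delta scale; 0 means no DIF
    return alpha_mh, mh_d_dif
```

If, within a score stratum, 6 of 10 reference-group examinees and 3 of 5 focal-group examinees answer correctly, the groups perform identically once matched, so the common odds ratio is 1 and the D-DIF value is 0. A negative D-DIF value indicates that the item is harder for the focal group than for matched reference-group examinees. How large a value should count as meaningful DIF is exactly the kind of standard the discussion later in this report calls for.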

Finally, four studies found that teachers' beliefs about accommodation use influenced the
selection of instructional and assessment accommodations for students with disabilities.

Limitations Cited by Researchers 

Limitations of the research were discussed in 21 of the articles reviewed for this analysis. Three
primary limitations were identified (see Table 12): unknown variations among students included
in the study, sample sizes too small to provide adequate statistical support, and nonstandard
administration of the accommodations across proctors and schools. The wide-ranging unknown
variations cited by researchers included type of disability, the withholding of typically used
accommodations, and self-selection biases. Appendix H contains a list of limitations found across
the studies.



Table 12. Research Limitations

Limitation                                                Number of Studies (out of 21 reporting limitations)*
Unknown Variations Between Students                       11
Small Sample Size                                          9
Nonstandard Administration Across Proctors and Schools     6

* Some researchers cited limitations in more than one category.



Recommendations for Future Research 

Recommendations for future research were made in 21 of the articles (see Table 13). The
recommendations ranged from suggestions by 11 authors to replicate the research for validation and
generalization, to investigating associations with specific disabilities, conducting more detailed
non-experimental studies to provide richer data, increasing researcher control of the testing
process, and studying larger groups of students. These recommendations are described further
in Appendix H.



Table 13. Recommendations for Future Research

Recommendations                                                         Number of Studies (out of 21 listing recommendations)*
Replicate Results for Validation and Generalization                     11
Investigate Specific Disability Associations                             4
Conduct More Detailed Non-experimental Studies to Provide Richer Data    3
Increase Researcher Control of Testing Process                           2
Study Larger Sample                                                      1

* Some researchers did not make recommendations; others made more than one recommendation.
Discussion and Implications for Future Research



Several important observations are evident from the analysis of the 46 studies included in this
synthesis report. These observations are not conclusive, but they can provide direction for future
research, policy, and practice. We discuss here some of the primary observations from our analysis,
along with their implications for the future.

Over half of the studies examined the effects of the use of accommodations on test scores. While
this purpose continues to be important, additional studies are needed that investigate the effects
of accommodations under much more carefully defined conditions. Specifically, there is a need
for clear definition of the constructs tested, not just for the test in general, but for each and
every item. There needs to be corroborating information that each item does indeed assess the
intended construct. At the same time, greater clarity is needed about the accommodations that
individual students need, including independent ways of measuring whether each student who
participates in an accommodation study actually needs the accommodation being studied. Once
this clarity is obtained, better studies of test score validity can be conducted. These studies
should examine both the extent to which an accommodation increases the scores of students who
need it and whether it results in better measurement of those students’ knowledge and skills,
that is, measurement comparable to that obtained for students without disabilities.

Studies are also needed that explore the desirability and perceived usefulness of accommodations
to students themselves, the “end users” of assessments. Several researchers cited a lack
of information about the individual students who used accommodations as a limitation; in fact,
they often expressed frustration about not really knowing whether individual students actually
needed the accommodations that were provided to them. Research in which random or inadequate
decisions are made about who should use an accommodation, or research in which students are
using an accommodation for the first time, may have limited validity. Researchers also need
to consider the implications of multiple accommodation use. Most students use a combination
of accommodations during an assessment (Bielinski, Ysseldyke, Bolt, Friedebach & Friedebach,
2001; Brown & Augustine, 2001; Elliott, Bielinski, Thurlow, DeVito, & Hedlund, 1999) and
may not do well if only one accommodation is provided in research situations.

Almost half of the studies used state-level criterion-referenced tests or performance assessments,
with fewer using norm-referenced or school-designed tests. Because criterion-referenced or
standards-based tests are central to accountability (as required by the No Child Left Behind Act
of 2001), accommodations research using these tests continues to be the most relevant for states.
In addition, the majority of the studies used assessments that addressed the content areas of
mathematics and English language arts. Since NCLB requires states to assess students in science
by 2007-2008, increased research in science will be important.
Over one third of the studies focused on the accommodation of extended time; in fact, the majority
of accommodations research over the past several years examined extended time. Extended time is
an issue for students who are taking norm-referenced tests, which have traditionally been timed;
however, most state tests are criterion-referenced and do not have time limitations, and this
research has fairly consistently concluded that extended time helps students with disabilities.
It is therefore time to move on and do less research on extended time and more on other, more
controversial and highly used accommodations. For example, oral test administration is a very
important and controversial accommodation for students with a variety of disabilities, and
research on it needs to continue.

Another growing concern is the use of computer-based testing, not just as an accommodation, but
for all students. The advent of computer-based testing will bring new challenges for students with
disabilities. Research in this area has begun, but it will grow in intensity and importance
as these assessments are developed and used across states (Thompson, Thurlow, Quenemoen,
& Lehr, 2002). Research is specifically needed on several features that do not apply
to paper/pencil tests, such as familiarity with computer use, screen navigation, screen readers
(a variation of oral presentation), and speech recognition software.

There are several considerations for future research in the selection of the participant sample.
First is the number of students needed for an adequate sample. The number of participants in the
studies examined in this paper ranged from fewer than one hundred to several thousand. Though
the optimal number of participants varies somewhat with the research design, several researchers
cited an inadequate sample size as a limitation of their results. A second consideration is the
percentage of the sample that consists of students with disabilities. This also varied greatly
across studies, from less than 25% of the sample to 100%, and in eight studies the percentage was
unknown. It is important to have at least roughly as many students with disabilities as without,
especially in studies using research designs that compare the effects of accommodations between
the two groups.
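A simple calculation shows why balance between the two groups matters. Assuming both groups have the same score standard deviation and the total sample size is fixed, the standard error of the difference between the two group means, sd × sqrt(1/n1 + 1/n2), is smallest when the groups are equal in size. The sketch below is illustrative only: the function name and the total of 200 students are hypothetical, and real accommodation studies, especially those testing group-by-condition interactions, involve more complex designs.

```python
import math


def se_of_difference(n_with, n_without, sd=1.0):
    """Standard error of the difference between two group means,
    assuming both groups share the same score standard deviation `sd`."""
    return sd * math.sqrt(1.0 / n_with + 1.0 / n_without)


total = 200  # hypothetical total sample size
for n_with in (20, 50, 100):
    n_without = total - n_with
    # Print the split and the resulting standard error (in sd units).
    print(n_with, n_without, round(se_of_difference(n_with, n_without), 3))
```

With the total held at 200, moving from a 20/180 split to a 100/100 split lowers the standard error of the group difference from about 0.24 to about 0.14 standard deviation units, so the same number of students yields a more precise comparison when the groups are balanced.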

In addition to the percentage of students with and without disabilities, it is important to have
information about the specific disabilities of study participants. Most of the studies
examined students with learning or other cognitive deficits. Although it is important to focus
research on the largest number of students affected by accommodations use, additional research
is needed on accommodations use by students with visual, hearing, and physical disabilities.
These students are fewer in number than those with learning disabilities, but they often have very
complex accommodation needs, including Braille, sign language interpretation, and assistive
technology.

Finally, in looking at participant grade levels, we noted that the majority of studies examined
students at the elementary and middle school levels, with very few at the high school level. The
greatest number of studies took place at the fourth grade level. Students at this age may not yet
have command of many accommodations, and factors such as inexperience with accommodations
and with large-scale tests may affect research results.

Over one third of the studies (17) applied non-experimental research designs rather than one of
the group or single-subject designs described in Thurlow et al. (2000). More rigorous research,
using designs that compare scores and interactions between the presence and absence of a
disability, is needed in the future.

Given the increased emphasis placed on scientifically based findings, the need for more
experimental designs is obvious. These designs allow for clearer discrimination of the effects of
accommodations, including the isolation of the effect of specific accommodations. Although
experimental designs are best, it is not always possible to employ them. The benefits
of non-experimental research are (1) large sample sizes, sometimes even all students in a state,
and (2) real-world testing situations (i.e., results should reflect what actually takes place in
real-world testing). Within the domain of accommodations research, non-experimental research can
play a vital role, addressing the question of comparability in a way not possible with most
experimental studies.

The results of the research examined in this paper vary, but show some consistencies that are
worth noting. First of all, three accommodations showed a positive effect on student test scores
across at least four studies: computer administration, oral presentation, and extended time.
However, additional studies on each of these accommodations also found no significant effect on
scores or found alterations in item comparability. All of the meta analyses of accommodated
conditions found a positive effect on scores, and all of the studies examining differential item
functioning (DIF) under accommodated conditions found some items that exhibited DIF. Almost
all DIF studies expose a few DIF items, whether the comparison is between males and females,
different ethnic groups, or accommodated and non-accommodated conditions. A small amount of DIF
therefore usually does not pose a problem, but how much DIF counts as “little”? Clarification of
these standards is needed.

Another common finding was that accommodation decisions are based primarily on educator
beliefs. The inconsistencies in these decisions have been well documented (Fuchs, Fuchs, Eaton,
Hamlett, & Karns, 2000a; Schulte, Elliott, & Kratochwill, 2000). Identifying specific ways to
improve these decisions and to verify them is clearly needed.

The primary recommendation made by researchers for future studies on accommodations is
further replication of previously conducted research for increased validation and generalization,
with consideration of the other recommendations previously presented in this discussion. It will be
important to address the limitations cited repeatedly by researchers: unknown variations between
students included in the study, sample sizes too small to provide adequate statistical support,
and nonstandard administration of the accommodations across proctors and schools. Some of
these limitations are simply the result of the difficulties inherent in this type of research.

This summary is intended to provide direction to the design of critically needed future research
on accommodations use. We are beginning to explore the notion that tests can be designed from
the beginning to be better for everyone. Future research should also explore the effects of
assessment design and standardization to see whether new item designs and more flexible testing
conditions reduce the need for accommodations while facilitating measurement of the critical
constructs for students with disabilities. It is possible that through implementation of the
principles of universal design (Thompson, Johnstone, & Thurlow, 2002), the need for
accommodations will decrease, and the measurement of what students know and can do will
improve for all students.